FitBit Tracker Exploratory Data Analysis

Notes:

The dataset is from https://www.kaggle.com/datasets/arashnic/fitbit under the CC0: Public Domain license
The data consist of 18 CSV files covering 33 participants over a period of one month, from 4/12/2016 to 5/12/2016.

Data Preparation

Preliminary screening of data:
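A minimal sketch of what such a screening could look like, assuming pandas. The rows below are made-up stand-in data, not values from the Kaggle files, and the ActivityDate column name is an assumption; in practice the frame would be loaded from one of the CSV files.

```python
import pandas as pd

# Made-up stand-in rows; not taken from the actual dataset.
dailyActivity = pd.DataFrame({
    "Id": [1, 1, 2, 2, 2],
    "ActivityDate": ["4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016", "4/13/2016"],
    "TotalSteps": [8000, 0, 4500, 4500, 12000],
})

# Basic screening: size, number of participants, missing values, exact duplicate rows.
screen = {
    "rows": len(dailyActivity),
    "participants": dailyActivity["Id"].nunique(),
    "missing_values": int(dailyActivity.isna().sum().sum()),
    "duplicate_rows": int(dailyActivity.duplicated().sum()),
}
print(screen)
```

Checks like these make it easy to confirm the participant count and to spot duplicates or gaps before any analysis.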

Data Cleaning:
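The notes do not list the cleaning steps that were applied. As a hedged sketch, two common ones are dropping exact duplicate rows and parsing date strings into datetimes. The sample rows and the date format string below are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in rows with one exact duplicate.
sleepDay = pd.DataFrame({
    "Id": [1, 1, 1],
    "SleepDay": ["4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"],
    "TotalMinutesAsleep": [400, 400, 450],
})

# Remove exact duplicate rows, then convert the date strings to datetimes.
sleepDay = sleepDay.drop_duplicates().reset_index(drop=True)
sleepDay["SleepDay"] = pd.to_datetime(sleepDay["SleepDay"], format="%m/%d/%Y %I:%M:%S %p")
print(sleepDay.dtypes)
```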

Univariate Analysis

Descriptive statistics for the dailyActivity dataframe
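A minimal sketch of how such a summary could be produced with pandas, using made-up values rather than the real data:

```python
import pandas as pd

# Made-up stand-in values for two of the dailyActivity columns.
dailyActivity = pd.DataFrame({
    "TotalSteps": [0, 4000, 8000, 12000],
    "Calories": [1800, 2000, 2700, 3000],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
summary = dailyActivity.describe()
print(summary)
```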

The mean number of steps taken per day is about 7638. According to MedicalNewsToday, doctors generally consider fewer than 5000 steps per day to be "sedentary". A 2020 study found that participants who took 8,000 steps per day had a 51% lower risk of dying by any cause than those who took 4,000 steps per day. The histogram of TotalSteps shows a slight right skew and a large number of entries with 0-499 steps, which is undesirable. One idea could be to notify users at the beginning of the day to pay attention to their steps.

Histograms for daily activity metrics
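As a rough sketch, the bin counts behind a histogram like the one for TotalSteps can be computed directly, which makes a spike in the lowest bin easy to quantify. The values below are made up, and the 500-step bin width is an assumption chosen to match the 0-499 range mentioned in these notes.

```python
import numpy as np
import pandas as pd

# Made-up stand-in values for TotalSteps.
dailyActivity = pd.DataFrame({"TotalSteps": [0, 120, 480, 3500, 7200, 7600, 9100, 15000]})

# Bin edges every 500 steps: [0, 500), [500, 1000), ...
edges = np.arange(0, dailyActivity["TotalSteps"].max() + 501, 500)
counts, _ = np.histogram(dailyActivity["TotalSteps"], bins=edges)

# Number of entries in the lowest bin (0-499 steps).
lowest_bin_count = int(counts[0])
print(lowest_bin_count)
```

With matplotlib installed, dailyActivity["TotalSteps"].plot.hist(bins=edges) would draw the same bins as a chart.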

The distribution of TotalSteps is somewhat right skewed with a spike of entries below 500 steps.
The distribution for Calories seems to be bimodal with a peak around 2000 calories and a second peak around 2700 calories.
The distributions for VeryActiveMinutes and FairlyActiveMinutes are extremely right skewed.
The distribution for LightlyActiveMinutes seems unimodal except for a spike of entries below 20 minutes.
The distribution for SedentaryMinutes seems to be bimodal and left skewed with an additional spike for entries above 1400 minutes.

There is a spike in entries both for the low end of LightlyActiveMinutes and the high end of SedentaryMinutes suggesting there are many days where people are minimally active. One goal to implement could be to nudge users to get past this threshold of minimal activity each day.

Calculating the percentage of entries with fewer than 5000 steps.
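One way this percentage could be computed, sketched with made-up values: comparing the column against the threshold gives a boolean Series, and its mean is the fraction of entries below the threshold.

```python
import pandas as pd

# Made-up stand-in values; 2 of the 5 entries are below 5000 steps.
dailyActivity = pd.DataFrame({"TotalSteps": [0, 4999, 5000, 7600, 12000]})

# The mean of a boolean Series is the share of True values.
pct_sedentary = (dailyActivity["TotalSteps"] < 5000).mean() * 100
print(round(pct_sedentary, 1))
```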

About 32% of entries fall below 5000 steps and would be considered sedentary. One improvement could be to use notifications to encourage people to walk at least 5000 steps a day.

Grouping dailyActivity by "Id" and aggregating the columns by mean. The resulting dataframe contains the daily averages of each column for each participant. These per-participant averages could serve as a summary of each individual's overall fitness characteristics.
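A minimal sketch of this aggregation with made-up rows; the name dailyAverages is hypothetical and not taken from the notebook.

```python
import pandas as pd

# Made-up stand-in rows: two days for each of two participants.
dailyActivity = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "TotalSteps": [4000, 6000, 10000, 12000],
    "Calories": [1800, 2000, 2600, 2800],
})

# One row per participant, holding the mean of each numeric column.
dailyAverages = dailyActivity.groupby("Id").mean(numeric_only=True)
print(dailyAverages)
```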

Grouping hourlyCalories and hourlySteps by hour in the day
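A sketch of the hourly grouping for steps, under the assumption that the timestamp column is called ActivityHour, the step column is called StepTotal, and timestamps use a 12-hour month/day/year format. The rows are made up; the same pattern would apply to hourlyCalories.

```python
import pandas as pd

# Made-up stand-in rows; column names and timestamp format are assumptions.
hourlySteps = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "ActivityHour": [
        "4/12/2016 8:00:00 AM", "4/12/2016 11:00:00 PM",
        "4/12/2016 8:00:00 AM", "4/13/2016 11:00:00 PM",
    ],
    "StepTotal": [600, 20, 400, 0],
})

# Parse the timestamps, then average the steps for each hour of the day (0-23).
hourlySteps["ActivityHour"] = pd.to_datetime(
    hourlySteps["ActivityHour"], format="%m/%d/%Y %I:%M:%S %p"
)
stepsByHour = hourlySteps.groupby(hourlySteps["ActivityHour"].dt.hour)["StepTotal"].mean()
print(stepsByHour)
```

The resulting Series is indexed by hour of day, so it can be passed straight to a bar chart.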

Bar Chart showing mean hourly calories burnt by hour of day

The graph follows intuition in that most calories are burnt during the day, between 8am and 8pm, when the mean calories burnt per hour are over 100.

Bar chart showing mean steps taken by hour of day

This chart also makes intuitive sense: most of the steps are taken during the day.

Descriptive statistics for the sleepDay dataframe

The mean of total minutes asleep is around 419, which translates to just under 7 hours of sleep per night (419 / 60 ≈ 6.98). According to Mayo Clinic, adults should get 7 or more hours of sleep per night, so the average sits right at the lower edge of the recommended amount.

Calculating the percentage of records with less than 7 hours of sleep
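A sketch of this calculation with made-up values, expressing the 7-hour threshold in minutes (7 * 60 = 420) to match TotalMinutesAsleep:

```python
import pandas as pd

# Made-up stand-in values; 2 of the 4 records are under 420 minutes.
sleepDay = pd.DataFrame({"TotalMinutesAsleep": [300, 419, 420, 480]})

# Share of records with strictly less than 7 hours of sleep.
threshold_minutes = 7 * 60
pct_short_sleep = (sleepDay["TotalMinutesAsleep"] < threshold_minutes).mean() * 100
print(pct_short_sleep)
```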

Around 44% of entries show less than the recommended 7 hours of sleep, which is less than ideal.

Grouping sleepDay by "Id" and aggregating the columns by mean. The resulting dataframe contains the daily averages for each individual.

Grouping sleepDay by "SleepDay" and aggregating the columns by mean. The resulting dataframe contains the average sleep measurements by the date.
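Both groupings of sleepDay can be sketched the same way. The rows are made up, the TotalTimeInBed column is an assumption, and the names sleepById and sleepByDate are hypothetical.

```python
import pandas as pd

# Made-up stand-in rows: two nights for each of two participants.
sleepDay = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "SleepDay": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13"]),
    "TotalMinutesAsleep": [400, 440, 420, 380],
    "TotalTimeInBed": [430, 470, 450, 400],
})

measures = ["TotalMinutesAsleep", "TotalTimeInBed"]

# Average per participant, and average per calendar date across participants.
sleepById = sleepDay.groupby("Id")[measures].mean()
sleepByDate = sleepDay.groupby("SleepDay")[measures].mean()
print(sleepById)
print(sleepByDate)
```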

Bivariate Analysis

Scatterplots of different levels of active minutes with total minutes asleep in that day.
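These scatterplots need activity and sleep values for the same participant and day in one table. A hedged sketch of one way to build such a table is an inner merge on participant and date. Whether merged_activity_sleep was built exactly this way is not stated in these notes; the ActivityDate column name and the rows are assumptions.

```python
import pandas as pd

# Made-up stand-in rows for daily activity and sleep.
dailyActivity = pd.DataFrame({
    "Id": [1, 1, 2],
    "ActivityDate": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
    "SedentaryMinutes": [700, 900, 1100],
})
sleepDay = pd.DataFrame({
    "Id": [1, 2, 2],
    "SleepDay": pd.to_datetime(["2016-04-12", "2016-04-12", "2016-04-13"]),
    "TotalMinutesAsleep": [450, 380, 400],
})

# Keep only participant-days that appear in both frames.
merged_activity_sleep = dailyActivity.merge(
    sleepDay,
    left_on=["Id", "ActivityDate"],
    right_on=["Id", "SleepDay"],
    how="inner",
)
print(merged_activity_sleep)
```

An inner merge drops days with activity but no sleep record, so the merged table can be noticeably smaller than dailyActivity.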

For VeryActiveMinutes, FairlyActiveMinutes, and LightlyActiveMinutes there is no clear association with TotalMinutesAsleep. However, there is a visibly negative correlation between SedentaryMinutes and TotalMinutesAsleep. At the higher end of SedentaryMinutes the two are likely close to mutually exclusive: a day has only 1440 minutes, and since SedentaryMinutes appears to be associated with time spent awake, very high values leave little room for sleep. Regardless, the negative correlation is noticeable even at lower values, before SedentaryMinutes would inherently cut into TotalMinutesAsleep.

Correlation Heatmap of merged_activity_sleep
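The numbers behind such a heatmap are a pairwise correlation matrix. A minimal sketch with made-up values (chosen so that sedentary time falls as sleep rises, purely for illustration):

```python
import pandas as pd

# Made-up stand-in rows for a few merged_activity_sleep columns.
merged_activity_sleep = pd.DataFrame({
    "SedentaryMinutes": [600, 700, 800, 900],
    "TotalMinutesAsleep": [480, 450, 400, 370],
    "VeryActiveMinutes": [10, 30, 20, 60],
    "Calories": [1900, 2300, 2100, 2900],
})

# Pearson correlation between every pair of numeric columns.
corr = merged_activity_sleep.corr()
print(corr.round(2))
```

With seaborn installed, sns.heatmap(corr, annot=True) is one common way to render this matrix as a heatmap.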

We can see that SedentaryMinutes is negatively correlated with TotalMinutesAsleep, while the distance, active minutes, and calories columns are positively correlated with each other, which makes sense.

Summary

We were able to identify some trends in the daily activity of users by looking at metrics such as the number of steps taken and the number of minutes spent at differing intensities of activity. Based on outside research in the healthcare field, we then identified areas where users' daily activity falls short and suggested how FitBit could better target and implement fitness goals to promote better overall health, for example by urging users to reach a minimum threshold of activity and reminding them to take enough steps and get enough sleep.

There were no metrics to identify user characteristics besides fitness activity, so we could not dig deeper into the characteristics of the users. Many of the data files lacked documentation of their units of measurement and structure, and the data source did not provide it either, so we limited the analysis to data whose column names make the measurement clear.